39 research outputs found

    Towards Text-to-SQL over Aggregate Tables

    Text-to-SQL aims to translate textual questions into the corresponding SQL queries. Aggregate tables are widely created for high-frequency queries. Although text-to-SQL has emerged as an important task, recent studies have paid little attention to the task over aggregate tables. The growing number of aggregate tables brings two challenges: (1) the mapping between natural language questions and relational databases suffers from greater ambiguity; (2) modern models usually adopt a self-attention mechanism to encode the database schema and question, and this mechanism has quadratic time complexity, which makes inference more time-consuming as the input sequence length grows. In this paper, we introduce a novel approach named WAGG for text-to-SQL over aggregate tables. To effectively select among ambiguous items, we propose a relation selection mechanism for relation computing. To deal with high computation costs, we introduce a dynamic pruning strategy to discard unrelated items, which are common in aggregate tables. We also construct a new large-scale dataset, SpiderwAGG, extended from the Spider dataset for validation; extensive experiments show the effectiveness and efficiency of our proposed method, with a 4% increase in accuracy and a 15% decrease in inference time relative to a strong baseline, RAT-SQL.
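    To make the pruning idea concrete, here is a minimal Python sketch, not the paper's WAGG implementation: because self-attention cost grows as O(n^2) in the input length n, discarding schema items unrelated to the question before encoding directly shrinks the attention input. The lexical-overlap scoring and the keep_min floor below are illustrative assumptions.

    # Illustrative sketch (not WAGG's actual relation selection or pruning):
    # drop schema items with no lexical overlap with the question so the
    # self-attention input stays short even when aggregate tables add
    # many derived columns.

    def prune_schema(question: str, schema_items: list[str], keep_min: int = 3) -> list[str]:
        """Keep schema items whose tokens overlap the question; always keep a floor."""
        q_tokens = set(question.lower().split())
        scored = []
        for item in schema_items:
            overlap = len(q_tokens & set(item.lower().replace("_", " ").split()))
            scored.append((overlap, item))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        kept = [item for overlap, item in scored if overlap > 0]
        # Fall back to the top-scoring items so the encoder never starves.
        return kept if len(kept) >= keep_min else [item for _, item in scored[:keep_min]]

    question = "average monthly sales per store in 2021"
    schema = ["store_id", "monthly_sales_avg", "customer_email",
              "sales_year", "shipping_address", "store_name"]
    print(prune_schema(question, schema))
    # ['monthly_sales_avg', 'store_id', 'sales_year', 'store_name']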

    Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data

    Tran DT, Wang H, Rudolph S, Cimiano P. Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data. In: Ioannidis YE, Lee DL, Ng RT, eds. Proceedings of the 25th International Conference on Data Engineering (ICDE’09). 2009: 405-416.

    GeoBERT: Pre-Training Geospatial Representation Learning on Point-of-Interest

    Thanks to the development of geographic information technology, geospatial representation learning based on POIs (Points-of-Interest) has gained widespread attention in the past few years. POIs are an important indicator of urban socioeconomic activity and are widely used to extract geospatial information. However, previous studies often focus on a specific area, such as a city or a district, and are designed only for particular tasks, such as land-use classification. Meanwhile, large-scale pre-trained models (PTMs) have recently achieved impressive success and become a milestone in artificial intelligence (AI). Against this background, this study proposes the first large-scale pre-trained geospatial representation learning model, called GeoBERT. First, we collect about 17 million POIs in 30 cities across China to construct pre-training corpora, with 313 POI types as the tokens and level-7 Geohash grids as the basic units. Second, we pre-train GeoBERT to learn grid embeddings in a self-supervised manner by masking POI types and predicting them. Third, under the “pre-training + fine-tuning” paradigm, we design five practical downstream tasks. Experiments show that, with just one additional output layer for fine-tuning, GeoBERT outperforms previous NLP methods (Word2vec, GloVe) used in geospatial representation learning by 9.21% on average in F1-score on classification tasks such as store site recommendation and working/living area prediction. On regression tasks such as POI number prediction, house price prediction, and passenger flow prediction, GeoBERT demonstrates even greater performance improvements. The experimental results show that pre-training on large-scale POI data can significantly improve the ability to extract geospatial information. In the discussion section, we provide a detailed analysis of what GeoBERT has learned from the perspective of attention mechanisms.
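    A minimal Python sketch of the masked-POI-type pre-training setup the abstract describes follows; the geohash strings, POI types, and helper names are illustrative assumptions, not GeoBERT's actual code. Each level-7 Geohash cell's POI types form one token sequence, and a training pair masks one type as the prediction label.

    # Illustrative sketch: bucket POIs by level-7 Geohash cell and build
    # masked-LM training pairs (mask a POI-type token, predict it).
    import random
    from collections import defaultdict

    # Hypothetical input: (geohash7, poi_type) pairs; a real pipeline would
    # derive geohash7 from lat/lon with a geohash library.
    pois = [("wx4g0ec", "restaurant"), ("wx4g0ec", "bank"),
            ("wx4g0ec", "pharmacy"), ("wx4g09b", "school"),
            ("wx4g09b", "restaurant")]

    cells = defaultdict(list)
    for cell, poi_type in pois:
        cells[cell].append(poi_type)

    def make_masked_example(tokens, mask_token="[MASK]"):
        """Mask one POI-type token; the masked type becomes the MLM label."""
        i = random.randrange(len(tokens))
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        return masked, tokens[i]

    for cell, tokens in cells.items():
        masked, label = make_masked_example(tokens)
        print(cell, masked, "->", label)
    # e.g. wx4g0ec ['restaurant', '[MASK]', 'pharmacy'] -> bank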
